Food and Drug Administration
Integrating RCTs, RWD, AI/ML and Statistics: Next-Generation Evidence Synthesis
Yang, Shu, Gamalo, Margaret, Fu, Haoda
Randomized controlled trials (RCTs) have been the cornerstone of clinical evidence; however, their cost, duration, and restrictive eligibility criteria limit power and external validity. Studies using real-world data (RWD), historically considered less reliable for establishing causality, are now recognized to be important for generating real-world evidence (RWE). In parallel, artificial intelligence and machine learning (AI/ML) are being increasingly used throughout the drug development process, providing scalability and flexibility but also presenting challenges in interpretability and rigor that traditional statistics do not face. This Perspective argues that the future of evidence generation will not depend on RCTs versus RWD, or statistics versus AI/ML, but on their principled integration. To this end, a causal roadmap is needed to clarify inferential goals, make assumptions explicit, and ensure transparency about tradeoffs. We highlight key objectives of integrative evidence synthesis, including transporting RCT results to broader populations, embedding AI-assisted analyses within RCTs, designing hybrid controlled trials, and extending short-term RCTs with long-term RWD. We also outline future directions in privacy-preserving analytics, uncertainty quantification, and small-sample methods. By uniting statistical rigor with AI/ML innovation, integrative approaches can produce robust, transparent, and policy-relevant evidence, making them a key component of modern regulatory science.
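One of the integrative objectives named above, transporting RCT results to a broader population, can be illustrated with a minimal inverse-odds-of-participation weighting sketch. All numbers, the covariate `age_group`, and the group shares below are invented for illustration; they are not from the Perspective.

```python
# Hypothetical sketch: transporting an RCT treatment effect to a target
# population by reweighting arms so the RCT covariate mix matches the
# target population's mix. Data and probabilities are illustrative.

# RCT participants: (age_group, treated, outcome)
rct = [
    ("young", 1, 1.0), ("young", 0, 0.4),
    ("old", 1, 0.8), ("old", 0, 0.5),
]

# Share of each age group in the RCT vs. the target population (assumed)
p_rct = {"young": 0.5, "old": 0.5}
p_target = {"young": 0.2, "old": 0.8}

def transported_effect(rows):
    """Weighted difference in mean outcomes, weights = target/RCT odds."""
    num_t = den_t = num_c = den_c = 0.0
    for group, treated, y in rows:
        w = p_target[group] / p_rct[group]  # inverse-odds weight
        if treated:
            num_t += w * y
            den_t += w
        else:
            num_c += w * y
            den_c += w
    return num_t / den_t - num_c / den_c

effect = transported_effect(rct)
```

The same weighting idea underlies more elaborate transportability estimators; here it simply shifts influence toward the "old" stratum, which is underrepresented in the toy RCT relative to the target population.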
Prostate-VarBench: A Benchmark with Interpretable TabNet Framework for Prostate Cancer Variant Classification
Tavara, Abraham Francisco Arellano, Kumar, Umesh, Pradeepkumar, Jathurshan, Sun, Jimeng
Variants of Uncertain Significance (VUS) limit the clinical utility of prostate cancer genomics by delaying diagnosis and therapy when evidence for pathogenicity or benignity is incomplete. Progress is further limited by inconsistent annotations across sources and the absence of a prostate-specific benchmark for fair comparison. We introduce Prostate-VarBench, a curated pipeline for creating prostate-specific benchmarks that integrates COSMIC (somatic cancer mutations), ClinVar (expert-curated clinical variants), and TCGA-PRAD (prostate tumor genomics from The Cancer Genome Atlas) into a harmonized dataset of 193,278 variants supporting patient- or gene-aware splits to prevent data leakage. To ensure data integrity, we corrected a Variant Effect Predictor (VEP) issue that merged multiple transcript records, introducing ambiguity in clinical significance fields. We then standardized 56 interpretable features across eight clinically relevant tiers, including population frequency, variant type, and clinical context. AlphaMissense pathogenicity scores were incorporated to enhance missense variant classification and reduce VUS uncertainty. Building on this resource, we trained an interpretable TabNet model to classify variant pathogenicity, whose step-wise sparse masks provide per-case rationales consistent with molecular tumor board review practices. On the held-out test set, the model achieved 89.9% accuracy with balanced class metrics, and the VEP correction yields a 6.5% absolute reduction in VUS.
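The patient-aware splitting the abstract mentions can be sketched with a deterministic hash on the patient ID, so all of a patient's variants land in the same fold. The record layout, field names, and split fraction below are assumptions for illustration, not the benchmark's actual implementation.

```python
# Minimal sketch of a patient-aware train/test split to prevent leakage:
# hash the patient ID so no patient contributes records to both folds.
import hashlib

def patient_split(records, test_frac=0.2):
    """Route every record for a patient to the same fold by hashing
    the (hypothetical) patient_id field into 100 buckets."""
    train, test = [], []
    for rec in records:
        digest = hashlib.sha256(rec["patient_id"].encode()).hexdigest()
        bucket = int(digest, 16) % 100
        (test if bucket < test_frac * 100 else train).append(rec)
    return train, test

# Synthetic data: 50 patients, 4 variant records each
records = [
    {"patient_id": f"P{i % 50:03d}", "variant": f"var_{i}"}
    for i in range(200)
]
train, test = patient_split(records)
```

Because the assignment depends only on the patient ID, rerunning the split is reproducible, and no patient can straddle the train/test boundary.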
Robust or Suggestible? Exploring Non-Clinical Induction in LLM Drug-Safety Decisions
Liu, Siying, Zhang, Shisheng, Bala, Indu
Large language models (LLMs) are increasingly applied in biomedical domains, yet their reliability in drug-safety prediction remains underexplored. In this work, we investigate whether LLMs incorporate socio-demographic information into adverse event (AE) predictions, despite such attributes being clinically irrelevant. Using structured data from the United States Food and Drug Administration Adverse Event Reporting System (FAERS) and a persona-based evaluation framework, we assess two state-of-the-art models, ChatGPT-4o and Bio-Medical-Llama-3-8B, across diverse personas defined by education, marital status, employment, insurance, language, housing stability, and religion. We further evaluate performance across three user roles (general practitioner, specialist, patient) to reflect real-world deployment scenarios where commercial systems often differentiate access by user type. Our results reveal systematic disparities in AE prediction accuracy. Disadvantaged groups (e.g., low education, unstable housing) were frequently assigned higher predicted AE likelihoods than more privileged groups (e.g., postgraduate-educated, privately insured). Beyond outcome disparities, we identify two distinct modes of bias: explicit bias, where incorrect predictions directly reference persona attributes in reasoning traces, and implicit bias, where predictions are inconsistent, yet personas are not explicitly mentioned. These findings expose critical risks in applying LLMs to pharmacovigilance and highlight the urgent need for fairness-aware evaluation protocols and mitigation strategies before clinical deployment.
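The persona-stratified accuracy audit described above reduces, at its core, to grouping predictions by persona and comparing per-group accuracy. The personas, predictions, and labels below are synthetic stand-ins, not the paper's data.

```python
# Illustrative sketch of a persona-stratified accuracy audit:
# compute accuracy per persona group and the gap between groups.
from collections import defaultdict

def accuracy_by_persona(rows):
    """rows: (persona, predicted_label, true_label) triples."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for persona, pred, truth in rows:
        totals[persona] += 1
        hits[persona] += int(pred == truth)
    return {p: hits[p] / totals[p] for p in totals}

# Synthetic predictions for two personas (invented for illustration)
rows = [
    ("postgraduate", 1, 1), ("postgraduate", 0, 0), ("postgraduate", 1, 1),
    ("unstable_housing", 1, 0), ("unstable_housing", 1, 1), ("unstable_housing", 1, 0),
]
acc = accuracy_by_persona(rows)
gap = acc["postgraduate"] - acc["unstable_housing"]
```

A nonzero `gap` on clinically irrelevant attributes is exactly the kind of disparity the paper flags; a full audit would add significance testing and the explicit/implicit bias analysis of reasoning traces.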
Navigating the EU AI Act: Foreseeable Challenges in Qualifying Deep Learning-Based Automated Inspections of Class III Medical Devices
Diaz, Julio Zanon, Brennan, Tommy, Corcoran, Peter
As deep learning (DL) technologies advance, their application in automated visual inspection for Class III medical devices offers significant potential to enhance quality assurance and reduce human error. However, the adoption of such AI-based systems introduces new regulatory complexities, particularly under the EU Artificial Intelligence (AI) Act, which imposes high-risk system obligations that differ in scope and depth from established regulatory frameworks such as the Medical Device Regulation (MDR) and the U.S. FDA Quality System Regulation (QSR). This paper presents a high-level technical assessment of the foreseeable challenges that manufacturers are likely to encounter when qualifying DL-based automated inspections -- specifically static models -- within the existing medical device compliance landscape. It examines divergences in risk management principles, dataset governance, model validation, explainability requirements, and post-deployment monitoring obligations. The discussion also explores potential implementation strategies and highlights areas of uncertainty, including data retention burdens, global compliance implications, and the practical difficulties of achieving statistical significance in validation with limited defect data. Disclaimer: This paper presents a technical perspective and does not constitute legal or regulatory advice.
Generating Reliable Adverse event Profiles for Health through Automated Integrated Data (GRAPH-AID): A Semi-Automated Ontology Building Approach
Gadusu, Srikar Reddy, Callahan, Larry, Lababidi, Samir, Nishtala, Arunasri, Healey, Sophia, McGinty, Hande
As data and knowledge expand rapidly, adopting systematic methodologies for ontology generation has become crucial. With the daily increases in data volumes and frequent content changes, the demand for databases to store and retrieve information for the creation of knowledge graphs has become increasingly urgent. The previously established Knowledge Acquisition and Representation Methodology (KNARM) outlines a systematic approach to address these challenges and create knowledge graphs. However, following this methodology highlights the existing challenge of seamlessly integrating Neo4j databases with the Web Ontology Language (OWL). Previous attempts to integrate data from Neo4j into an ontology have been discussed, but these approaches often require an understanding of description logics (DL) syntax, which may not be familiar to many users. Thus, a more accessible method is necessary to bridge this gap. This paper presents a user-friendly approach that utilizes Python and its rdflib library to support ontology development. We showcase our novel approach through a Neo4j database we created by integrating data from the Food and Drug Administration (FDA) Adverse Event Reporting System (FAERS) database. Using this dataset, we developed a Python script that automatically generates the required classes and their axioms, facilitating a smoother integration process. This approach offers a practical solution to the challenges of ontology generation in the context of rapidly growing adverse drug event datasets, supporting improved drug safety monitoring and public health decision-making.
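The class-axiom generation step the paper describes (their script uses Python's rdflib) can be sketched with stdlib string formatting that emits OWL in Turtle syntax. The class names, namespace URI, and hierarchy below are invented examples, not the paper's actual FAERS ontology.

```python
# Stdlib-only sketch of auto-generating OWL class declarations and
# subclass axioms from a label hierarchy, mirroring the kind of output
# an rdflib-based script would produce. Names are illustrative.

PREFIXES = (
    "@prefix : <http://example.org/faers#> .\n"
    "@prefix owl: <http://www.w3.org/2002/07/owl#> .\n"
    "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .\n"
)

def classes_to_turtle(hierarchy):
    """hierarchy: {class_name: parent_name_or_None} -> Turtle text."""
    lines = [PREFIXES]
    for cls, parent in hierarchy.items():
        lines.append(f":{cls} a owl:Class .")
        if parent:
            lines.append(f":{cls} rdfs:subClassOf :{parent} .")
    return "\n".join(lines)

# Hypothetical node labels extracted from a Neo4j FAERS graph
faers_classes = {
    "Drug": None,
    "AdverseEvent": None,
    "SeriousAdverseEvent": "AdverseEvent",
}
ttl = classes_to_turtle(faers_classes)
```

In the paper's workflow, rdflib would build these triples programmatically and serialize them, which avoids requiring users to write description-logic syntax by hand.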
Systems-Theoretic and Data-Driven Security Analysis in ML-enabled Medical Devices
Mitra, Gargi, Hallajiyan, Mohammadreza, Kim, Inji, Dharmalingam, Athish Pranav, Elnawawy, Mohammed, Iqbal, Shahrear, Pattabiraman, Karthik, Alemzadeh, Homa
The integration of AI/ML into medical devices is rapidly transforming healthcare by enhancing diagnostic and treatment facilities. However, this advancement also introduces serious cybersecurity risks due to the use of complex and often opaque models, extensive interconnectivity, interoperability with third-party peripheral devices, Internet connectivity, and vulnerabilities in the underlying technologies. These factors contribute to a broad attack surface and make threat prevention, detection, and mitigation challenging. Given the highly safety-critical nature of these devices, a cyberattack on them can cause ML models to mispredict, posing significant safety risks to patients. Therefore, ensuring the security of these devices from the time of design is essential. This paper underscores the urgency of addressing the cybersecurity challenges in ML-enabled medical devices at the premarket phase. We begin by analyzing publicly available data on device recalls, adverse events, and known vulnerabilities to understand the threat landscape of AI/ML-enabled medical devices and their repercussions on patient safety. Building on this analysis, we introduce a suite of tools and techniques that we designed to assist security analysts in conducting comprehensive premarket risk assessments. Our work aims to empower manufacturers to embed cybersecurity as a core design principle in AI/ML-enabled medical devices, thereby making them safe for patients.
Diachronic and synchronic variation in the performance of adaptive machine learning systems: The ethical challenges
Hatherley, Joshua, Sparrow, Robert
Leveraging this 'adaptive' potential of medical ML could generate significant benefits for patient health and well-being. Recent engagements with the ethical issues generated by the use of adaptive ML systems in medicine have typically been limited to discussions of 'the update problem': how should systems that continue to change and evolve post-regulatory approval be regulated? In this paper, we draw attention to an important set of ethical issues raised by the use of adaptive machine learning systems in medicine that have, thus far, been neglected and are highly deserving of further attention. Discussions of adaptive machine learning systems to date have overlooked the distinction between two sorts of variance that such systems may exhibit -- diachronic evolution (change over time) and synchronic variation (difference between contemporaneous instantiations of the algorithmic system at different sites) -- and underestimated the significance of the latter. Both diachronic evolution and synchronic variation will complicate the hermeneutic task of clinicians in interpreting the outputs of AI systems, and will therefore pose significant challenges to the process of securing informed consent to treatment. Equity issues may occur where synchronic variation is permitted, as the quality of care may vary significantly across patients or between hospitals. However, the decision as to whether to allow or eliminate synchronic variation involves complex trade-offs between accuracy and generalisability, as well as a number of other values, including justice and non-maleficence. In some contexts, preventing synchronic variation from emerging may only be possible at the expense of the wellbeing, and the quality of care available to, particular patients or classes of patients. Designers and regulators of adaptive ML systems will need to confront these issues if the potential benefits of adaptive ML in medical care are to be realised.
Show, Don't Tell: Learning Reward Machines from Demonstrations for Reinforcement Learning-Based Cardiac Pacemaker Synthesis
Komp, John, Srinivas, Dananjay, Pacheco, Maria, Trivedi, Ashutosh
An (artificial cardiac) pacemaker is an implantable electronic device that sends electrical impulses to the heart to regulate the heartbeat. As the number of pacemaker users continues to rise, so does the demand for features with additional sensors, adaptability, and improved battery performance. Reinforcement learning (RL) has recently been proposed as a performant algorithm for creative design space exploration, adaptation, and statistical verification of cardiac pacemakers. The design of correct reward functions, expressed as a reward machine, is a key programming activity in this process. In 2007, Boston Scientific published a detailed description of their pacemaker specifications. This document has since formed the basis for several formal characterizations of pacemaker specifications using real-time automata and logic. However, because these translations are done manually, they are challenging to verify. Moreover, capturing requirements in automata or logic is notoriously difficult. We posit that it is significantly easier for domain experts, such as electrophysiologists, to observe and identify abnormalities in electrocardiograms that correspond to patient-pacemaker interactions. Therefore, we explore the possibility of learning correctness specifications from such labeled demonstrations in the form of a reward machine and training an RL agent to synthesize a cardiac pacemaker based on the resulting reward machine. We leverage advances in machine learning to extract signals from labeled demonstrations as reward machines using recurrent neural networks and transformer architectures. These reward machines are then used to design a simple pacemaker with RL. Finally, we validate the resulting pacemaker using properties extracted from the Boston Scientific document.
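A reward machine, the central object the abstract describes, is just a finite-state machine whose transitions emit rewards. The toy states, events, and reward values below are invented for illustration; a machine learned from electrophysiologists' labeled demonstrations would be far richer.

```python
# Toy reward machine: a finite-state machine mapping event sequences
# from patient-pacemaker interaction traces to rewards. States, events,
# and reward values are hypothetical.

class RewardMachine:
    def __init__(self, transitions, start):
        # transitions: (state, event) -> (next_state, reward)
        self.transitions = transitions
        self.state = start

    def step(self, event):
        self.state, reward = self.transitions[(self.state, event)]
        return reward

# Hypothetical spec: reward pacing only after a missed intrinsic beat.
rm = RewardMachine({
    ("wait", "sense"): ("wait", 0.0),   # intrinsic beat sensed: do nothing
    ("wait", "miss"):  ("due", 0.0),    # beat missed: a pace is now due
    ("due", "pace"):   ("wait", 1.0),   # correct pace: reward
    ("due", "miss"):   ("due", -1.0),   # failure to pace: penalty
    ("wait", "pace"):  ("wait", -0.5),  # unnecessary pace: small penalty
}, start="wait")

total = sum(rm.step(e) for e in ["sense", "miss", "pace", "pace"])
```

An RL agent trained against such a machine is rewarded for state-dependent behaviour (pace only when due) that a memoryless reward function over single events could not express.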
Confidence intervals uncovered: Are we ready for real-world medical imaging AI?
Christodoulou, Evangelia, Reinke, Annika, Houhou, Rola, Kalinowski, Piotr, Erkan, Selen, Sudre, Carole H., Burgos, Ninon, Boutaj, Sofiène, Loizillon, Sophie, Solal, Maëlys, Rieke, Nicola, Cheplygina, Veronika, Antonelli, Michela, Mayer, Leon D., Tizabi, Minu D., Cardoso, M. Jorge, Simpson, Amber, Jäger, Paul F., Kopp-Schneider, Annette, Varoquaux, Gaël, Colliot, Olivier, Maier-Hein, Lena
Medical imaging is spearheading the AI transformation of healthcare. Performance reporting is key to determine which methods should be translated into clinical practice. Frequently, broad conclusions are simply derived from mean performance values. In this paper, we argue that this common practice is often a misleading simplification as it ignores performance variability. Our contribution is threefold. (1) Analyzing all MICCAI segmentation papers (n = 221) published in 2023, we first observe that more than 50% of papers do not assess performance variability at all. Moreover, only one (0.5%) paper reported confidence intervals (CIs) for model performance. (2) To address the reporting bottleneck, we show that the unreported standard deviation (SD) in segmentation papers can be approximated by a second-order polynomial function of the mean Dice similarity coefficient (DSC). Based on external validation data from 56 previous MICCAI challenges, we demonstrate that this approximation can accurately reconstruct the CI of a method using information provided in publications. (3) Finally, we reconstructed 95% CIs around the mean DSC of MICCAI 2023 segmentation papers. The median CI width was 0.03, which is three times larger than the median performance gap between the first and second ranked method. For more than 60% of papers, the mean performance of the second-ranked method was within the CI of the first-ranked method. We conclude that current publications typically do not provide sufficient evidence to support which models could potentially be translated into clinical practice.
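The CI reconstruction in contribution (2) can be sketched as a standard mean CI whose SD comes from a second-order polynomial in the mean DSC. The polynomial coefficients below are placeholders, not the values the paper fitted on the 56 MICCAI challenges.

```python
# Sketch of reconstructing a 95% CI from a reported mean DSC using an
# SD approximated as a second-order polynomial of the mean. The
# coefficients A, B, C are hypothetical placeholders.
import math

A, B, C = -0.3, 0.2, 0.1  # placeholder polynomial coefficients

def approx_ci(mean_dsc, n_cases, z=1.96):
    """Approximate 95% CI for the mean DSC: SD ~= A*m^2 + B*m + C,
    half-width = z * SD / sqrt(n_cases)."""
    sd = A * mean_dsc**2 + B * mean_dsc + C
    half = z * sd / math.sqrt(n_cases)
    return mean_dsc - half, mean_dsc + half

lo, hi = approx_ci(0.85, n_cases=100)
```

With the actual fitted coefficients and a paper's reported mean DSC and test-set size, this is enough to judge whether two ranked methods' intervals overlap, which is precisely the analysis in contribution (3).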
Congressman slams FDA for ignoring 'troubling evidence' about Elon Musk's Neuralink and allowing brain chip to be implanted in humans - despite botching experiments on monkeys
Lawmakers have slammed the Food and Drug Administration for ignoring 'troubling evidence' of Elon Musk's Neuralink practices and pushing the brain chip to human trials. Rep. Earl Blumenauer (D-Oregon) penned a letter to the FDA, criticizing the agency for not inspecting the company despite its long list of animal abuse allegations that span back to at least 2019. The Democrat cited 2022 reports that described employees' complaints of 'hack jobs' of animal experiments due to a rushed schedule, causing needless suffering and deaths. The open letter also stated 'these alleged failures to follow standard operating procedures potentially endangered animal welfare and compromised data collection for human trials.' Blumenauer is now demanding the FDA explain how it reconciled reports of such lapses with its decision to authorize Neuralink's human trial.